@hdefazio hdefazio commented Oct 17, 2025

Depends on #371

This PR fixes a critical issue where the end-to-end test suite was failing in environments using Podman due to how kind loads container images.

What this PR does / why we need it:
Previously, our tests used kind load docker-image to load test images into the kind cluster. This works with Docker but is unreliable with Podman, especially in rootless environments: the command fails with errors such as "image not present locally" or "stat -: no such file or directory" because the test runner cannot correctly connect to the user's Podman session or handle piped image data.
This is a known issue with kind (see kubernetes-sigs/kind#2038 and kubernetes-sigs/kind#3105), and this PR implements the suggested workaround: when Podman is detected, images are loaded with the image-archive method instead.
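
For illustration only, here is a minimal sketch of what such a workaround can look like in a shell script. It is not the code in this PR: the helper name load_image, its arguments, and the RUN dry-run hook are hypothetical, and writing the archive to a temporary file (rather than piping it) is an assumption about how to sidestep the stdin problems described above.

```shell
#!/usr/bin/env bash
# Sketch only: load a local image into a kind cluster, going through an
# archive file when the container tool is Podman.
# load_image, its arguments, and the RUN hook are hypothetical names.

load_image() {
  local tool="$1" cluster="$2" image="$3"
  local run="${RUN:-}"   # set RUN=echo to print the commands instead of running them

  if [ "${tool}" = "podman" ]; then
    # Save to a real file instead of piping through /dev/stdout and /dev/stdin.
    local dir archive status=0
    dir="$(mktemp -d)" || return 1
    archive="${dir}/image.tar"
    { ${run} podman save -o "${archive}" "${image}" &&
      ${run} kind load image-archive "${archive}" --name "${cluster}"; } || status=$?
    rm -rf "${dir}"   # always clean up the temporary archive
    return "${status}"
  fi

  # Docker: kind can read the image directly from the daemon.
  ${run} kind load docker-image "${image}" --name "${cluster}"
}
```

Calling RUN=echo load_image podman e2e-tests ghcr.io/llm-d/llm-d-inference-sim:latest prints the two commands without touching Podman or kind, which makes the branch logic easy to check.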

Testing:

$ make test-e2e
✅ Container tool 'podman' found.
==== Building Docker image ghcr.io/llm-d/llm-d-inference-scheduler:dev ====
podman build \
....
Successfully tagged ghcr.io/llm-d/llm-d-inference-scheduler:dev
b8a33690807f06e92527fb1d04328cefaea6d225e2fb9b996343e1c5be0cac35

==== Pulling Docker images ====
./scripts/pull_images.sh
Using container tool: podman
--- Using the following images ---
Scheduler Image:     ghcr.io/llm-d/llm-d-inference-scheduler:dev
Simulator Image:     ghcr.io/llm-d/llm-d-inference-sim:latest
Sidecar Image:       ghcr.io/llm-d/llm-d-routing-sidecar:v0.2.0
----------------------------------------------------
Pulling dependencies...
...
==== Running End to End Tests ====
./test/scripts/run_e2e.sh
Running end to end tests
  "level"=0 "msg"="Successfully loaded environment variable" "key"="CONTAINER_TOOL" "value"="podman"
  "level"=0 "msg"="Successfully loaded environment variable" "key"="EPP_IMAGE" "value"="ghcr.io/llm-d/llm-d-inference-scheduler:dev"
  "level"=0 "msg"="Successfully loaded environment variable" "key"="VLLM_SIMULATOR_IMAGE" "value"="ghcr.io/llm-d/llm-d-inference-sim:latest"
  "level"=0 "msg"="Successfully loaded environment variable" "key"="ROUTING_SIDECAR_IMAGE" "value"="ghcr.io/llm-d/llm-d-routing-sidecar:v0.2.0"
  "level"=0 "msg"="Environment variable not set, using default value" "key"="EXISTS_TIMEOUT" "defaultValue"="30s"
  "level"=0 "msg"="Environment variable not set, using default value" "key"="READY_TIMEOUT" "defaultValue"="3m0s"
  "level"=0 "msg"="Environment variable not set, using default value" "key"="MODEL_READY_TIMEOUT" "defaultValue"="10m0s"
=== RUN   TestEndToEnd
Running Suite: End To End Test Suite - 
==============================================================================================================
Random Seed: 1760660772

Will run 3 of 3 specs
------------------------------
[BeforeSuite] 
  enabling experimental podman provider
  Creating cluster "e2e-tests" ...
  Set kubectl context to "kind-e2e-tests"
  You can now use your cluster with:

  kubectl cluster-info --context kind-e2e-tests

  Thanks for using kind! 😊
  STEP: Loading image into Kind cluster: ghcr.io/llm-d/llm-d-inference-sim:latest @ 10/16/25 20:26:34.463
  "level"=0 "msg"="Podman detected, using image-archive method." "path"="/usr/bin/podman"
  Copying blob sha256:778d8c610941586099cac6c507cad2d1156b71b2bb54c42cebedf8808c68edb9
  Writing manifest to image destination
  enabling experimental podman provider
  STEP: Loading image into Kind cluster: ghcr.io/llm-d/llm-d-inference-scheduler:dev @ 10/16/25 20:26:40.428
  "level"=0 "msg"="Podman detected, using image-archive method." "path"="/usr/bin/podman"
  Copying blob sha256:004d2c90a65694c2830b06fddc1047d40063c6cb36fb31a5a3edfce9435326c6
  Writing manifest to image destination
  enabling experimental podman provider
  STEP: Loading image into Kind cluster: ghcr.io/llm-d/llm-d-routing-sidecar:v0.2.0 @ 10/16/25 20:26:47.652
  "level"=0 "msg"="Podman detected, using image-archive method." "path"="/usr/bin/podman"
  Copying blob sha256:ff59c129bdee8355d5b47559167f5f7c893dc99d9779a2b3194fa59152e90110
  Writing manifest to image destination
  enabling experimental podman provider
[BeforeSuite] PASSED [48.537 seconds]
...
Ran 3 of 3 Specs in 116.816 seconds
SUCCESS! -- 3 Passed | 0 Failed | 0 Pending | 0 Skipped
--- PASS: TestEndToEnd (116.82s)
PASS
ok      github.com/llm-d/llm-d-inference-scheduler/test/e2e     116.833s

@hdefazio hdefazio marked this pull request as draft October 17, 2025 00:07
@hdefazio hdefazio changed the title from "Fix the e2e tests so they work with podman" to "Fix Image Loading for Podman in E2E Tests" Oct 17, 2025
containers:
- name: epp
- image: ghcr.io/llm-d/llm-d-inference-scheduler:latest
+ image: ${EPP_IMAGE}
Collaborator

Please undo this change. As it was, the YAML file can also be used outside of the kind-based tests.


images:
- name: ghcr.io/llm-d/llm-d-inference-scheduler
newTag: ${EPP_TAG}
Collaborator

Please undo this change. As it was, the YAML file can also be used outside of the kind-based tests.

initContainers:
- name: routing-sidecar
- image: ghcr.io/llm-d/llm-d-routing-sidecar:latest
+ image: ${ROUTING_SIDECAR_IMAGE}
Collaborator

Please undo this change. As it was, the YAML file can also be used outside of the kind-based tests.

containers:
- name: vllm
- image: ghcr.io/llm-d/llm-d-inference-sim:latest
+ image: ${VLLM_SIMULATOR_IMAGE}
Collaborator

Please undo this change. As it was, the YAML file can also be used outside of the kind-based tests.

- name: ghcr.io/llm-d/llm-d-inference-sim
newTag: ${VLLM_SIMULATOR_TAG}
- name: ghcr.io/llm-d/llm-d-routing-sidecar
newTag: ${ROUTING_SIDECAR_TAG}
Collaborator

Please undo this change. As it was, the YAML file can also be used outside of the kind-based tests.
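
The manifests above are parameterized by tag (${EPP_TAG}, ${VLLM_SIMULATOR_TAG}, ${ROUTING_SIDECAR_TAG}), while the test environment supplies full image references such as ${EPP_IMAGE}. One way the two could coexist, offered as an assumption rather than something proposed in this thread, is to derive the name and tag from the full reference. The helper names image_tag and image_name are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: split an image reference into name and tag.
# image_tag and image_name are hypothetical helpers.
# Digest references (name@sha256:...) are not handled.

image_tag() {
  local ref="$1"
  local last="${ref##*/}"   # last path component, so a registry port is not mistaken for a tag
  case "${last}" in
    *:*) printf '%s\n' "${last##*:}" ;;
    *)   printf '%s\n' "latest" ;;   # no tag given: fall back to "latest"
  esac
}

image_name() {
  local ref="$1"
  local last="${ref##*/}"
  case "${last}" in
    *:*) printf '%s\n' "${ref%:*}" ;;   # drop the trailing ":tag"
    *)   printf '%s\n' "${ref}" ;;
  esac
}
```

With these, something like EPP_TAG="$(image_tag "${EPP_IMAGE}")" would let the tag-based YAML stay unchanged while the tests keep passing full image references.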

if docker image inspect ${VLLM_SIMULATOR_IMAGE} > /dev/null 2>&1; then
echo "INFO: Loading image into KIND cluster..."
- kind --name ${CLUSTER_NAME} load docker-image ${IMAGE_REGISTRY}/${VLLM_SIMULATOR_IMAGE}:${VLLM_SIMULATOR_TAG}
+ kind --name ${CLUSTER_NAME} load docker-image ${VLLM_SIMULATOR_IMAGE}
Collaborator

Please undo this change due to other changes that are requested to be undone.
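
The hunk above checks for the image with docker image inspect before loading it. As a small sketch, assuming the container tool is passed in rather than hard-coded, the same check can be written once for either tool. The helper name image_present is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: succeed when an image exists in the local store of the given tool.
# image_present is a hypothetical helper; "tool" is expected to be docker or podman.

image_present() {
  local tool="$1" image="$2"
  # Both docker and podman provide "image inspect"; only the exit status matters here.
  "${tool}" image inspect "${image}" > /dev/null 2>&1
}
```

A caller could then write: if image_present "${CONTAINER_RUNTIME}" "${VLLM_SIMULATOR_IMAGE}"; then ... fi, so that a missing image is reported before kind is invoked.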

# Load the ext_proc endpoint-picker image into the cluster
if [ "${CONTAINER_RUNTIME}" == "podman" ]; then
- podman save ${IMAGE_REGISTRY}/${EPP_IMAGE}:${EPP_TAG} -o /dev/stdout | kind --name ${CLUSTER_NAME} load image-archive /dev/stdin
+ podman save ${EPP_IMAGE} -o /dev/stdout | kind --name ${CLUSTER_NAME} load image-archive /dev/stdin
Collaborator

Please undo this change due to other changes that are requested to be undone.

podman save ${EPP_IMAGE} -o /dev/stdout | kind --name ${CLUSTER_NAME} load image-archive /dev/stdin
else
- kind --name ${CLUSTER_NAME} load docker-image ${IMAGE_REGISTRY}/${EPP_IMAGE}:${EPP_TAG}
+ kind --name ${CLUSTER_NAME} load docker-image ${EPP_IMAGE}
Collaborator

Please undo this change due to other changes that are requested to be undone.

- | envsubst '${POOL_NAME} ${MODEL_NAME} ${MODEL_NAME_SAFE} ${EPP_NAME} ${EPP_TAG} ${VLLM_SIMULATOR_TAG} \
- ${PD_ENABLED} ${KV_CACHE_ENABLED} ${ROUTING_SIDECAR_TAG} \
+ | envsubst '${POOL_NAME} ${MODEL_NAME} ${MODEL_NAME_SAFE} ${EPP_NAME} ${EPP_IMAGE} ${VLLM_SIMULATOR_IMAGE} \
+ ${PD_ENABLED} ${KV_CACHE_ENABLED} ${ROUTING_SIDECAR_IMAGE} \
Collaborator

Please undo this change due to other changes that are requested to be undone.
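
For context on why the variable list in that envsubst call matters: when envsubst is given a quoted list of variables, only those are substituted and every other ${...} in the template is left untouched. The sketch below shows this with a shortened list; the render function name and the sample template are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: substitute an explicit allowlist of variables in a template on stdin.
# render is a hypothetical helper; the list is shortened for illustration.

render() {
  # The single quotes stop the shell from expanding the names;
  # envsubst receives them literally and substitutes only these variables.
  envsubst '${EPP_IMAGE} ${VLLM_SIMULATOR_IMAGE} ${ROUTING_SIDECAR_IMAGE}'
}
```

A variable that appears in the manifests but not in the list (or the reverse, after a rename such as ${EPP_TAG} to ${EPP_IMAGE}) is silently left as a literal placeholder, which is why the list and the YAML have to change together.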


# Default image registry for pulling deployment images
export IMAGE_REGISTRY="${IMAGE_REGISTRY:-ghcr.io/llm-d}"

Collaborator

Please undo this change due to other changes that are requested to be undone.
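
As a brief sketch of how the registry default above combines with the name and tag variables used elsewhere in the script (the form ${IMAGE_REGISTRY}/${EPP_IMAGE}:${EPP_TAG}), the following helper builds a full reference. The function name image_ref is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: build "<registry>/<name>:<tag>", defaulting the registry
# the same way the script does. image_ref is a hypothetical helper.

image_ref() {
  local name="$1" tag="$2"
  # ${VAR:-default} uses the default when IMAGE_REGISTRY is unset or empty.
  local registry="${IMAGE_REGISTRY:-ghcr.io/llm-d}"
  printf '%s/%s:%s\n' "${registry}" "${name}" "${tag}"
}
```

With IMAGE_REGISTRY unset, image_ref llm-d-inference-sim latest yields the same reference that appears in the test log above.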

@elevran elevran moved this from In review to In progress in llm-d-inference-scheduler Oct 20, 2025